Estimating the Determinants of Health Literacy for Policy Prioritisation

UCL CDS Symposium on Data Science in Public Health

Nathan Green

Department of Statistical Science, UCL

Outline

  • Background
  • Problems
  • Solutions
    • Local level estimation
    • Predictive comparisons
    • Prioritisation with SUCRA
  • Main results
  • Sensitivity analysis
  • Conclusion

Resources

Slides and code here: github.com/n8thangreen/data-science-in-health-talk

Background


  • Project aim: assess the factors that determine health literacy, and the size of their impact, in Newham

  • Health literacy is broadly defined as the ability to access, understand, appraise, and communicate health information, enabling individuals to engage in healthcare and maintain good health throughout their lives.

  • Focusses on Newham, a diverse borough in East London that faces unique challenges
  • Identified as having some of the lowest levels of health literacy in the UK by University of Southampton (https://healthliteracy.geodata.uk/)

Previous method: Synthetic estimation

  • Weighted Logistic Regression with Synthetic Estimation (Laursen et al. (2016))
    • Frequentist single-level regression with poststratification
  • Used in geography (Gonzalez (1973); Rao and Molina (2015))
  • Can be viewed as the simpler predecessor to Multilevel Regression with Post-stratification (MRP)
    • Ignores any unique local factors
    • MRP includes shrinkage via random effects
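
As a rough illustration of what shrinkage means here, consider the textbook normal-normal case (an assumption made for exposition; the logistic model in this talk behaves analogously but has no closed form). The estimate for group \(j\) is approximately a precision-weighted average of the group's own mean \(\bar{y}_j\) and the overall mean \(\mu\), where \(n_j\), \(\sigma_y^2\) and \(\sigma_\alpha^2\) are illustrative symbols for the group sample size, the within-group variance and the between-group variance: \[ \hat{\alpha}_j \approx \frac{\frac{n_j}{\sigma_y^2} \bar{y}_j + \frac{1}{\sigma_\alpha^2} \mu}{\frac{n_j}{\sigma_y^2} + \frac{1}{\sigma_\alpha^2}} \]

Groups with little data (small \(n_j\)) are pulled towards \(\mu\); synthetic estimation corresponds to ignoring the group-specific term altogether.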

Some (rough) comparisons

| Small Area Estimation (SAE) | HTA / Statistics Method |
| --- | --- |
| Weighted Logistic Regression with Synthetic Estimation | \(\longrightarrow\) Multilevel Regression with Post-stratification (MRP) |
| Linear Plug-In Model (equivalent to Regression-Synthetic Estimator at unit level) | \(\longrightarrow\) Simulated Treatment Comparison (STC) |
| Residual-adjusted synthetic estimation | \(\longrightarrow\) Targeted Maximum Likelihood Estimation (TMLE) (in causal inference) |

Problem

  • What are the ‘drivers’ of health literacy?
  • Can we quantify them? Specific to Newham?
  • What would happen to health literacy if we were to intervene to affect one of these?

Data

  • Newham Residents Survey 2023 (NRS)

    • Periodic survey, usually every two years
    • Detailed information on views, experiences, and needs of Newham residents
    • Covers satisfaction with local services, community safety, health and well-being, housing, and employment
  • Skills for Life (SfL) Survey 2011

    • Comprehensive computer-based assessment conducted by the ONS to evaluate the skills of literacy, numeracy, and ICT
    • Total 7230 adults in England
  • Additional data

    • Labour Force Survey (LFS) / Annual Population Survey (APS)
    • UK Programme for the International Assessment of Adult Competencies (PIAAC) 2023
    • Skills for Life Survey 2003
    • UK Census 2011, 2021

Health literacy definition

  • From Rowlands et al. (2015)
  • Sample of health materials, including
    • medicine labels
    • booklets
    • application forms
  • Covered themes of health promotion, managing illness, systems navigation and disease prevention
  • Assessed for literacy and numeracy complexity by education experts
  • SfL responses were mapped to the binary health literacy scale according to whether they are above or below threshold

Newham vs SfL profiles

Multilevel Regression and Post-stratification

The predicted probability is defined as: \[ \hat{\pi}_i = \text{logit}^{-1} \left( \hat{\beta}_0 + \sum_{x} \hat{\beta}^{x}_{\gamma_x[i]} \right) \]

  • \(\hat{\beta}_0\) is the intercept, \(\hat{\beta}^{x}_{\gamma_x[i]}\) are coefficients for covariates \(x\)
    • age, sex, English language, white ethnicity, UK born, qualifications, income, job status, work role, home ownership
  • \(\gamma_x[i]\) represents the level or category for covariate \(x\) for individual \(i\).
  • IMD (Index of Multiple Deprivation) is included as a multilevel random effect: \(\beta^{\text{IMD}}_j \sim \text{N}(\mu_{\text{IMD}}, \sigma_{\text{IMD}}^2)\)
  • Prior distributions for the fixed effects are normal distributions centred at zero with modest variance
  • Half-normal priors are used for the random effect standard deviations
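
The predicted probability above can be sketched in a few lines of Python. This is a minimal illustration, not the fitted model: the covariate names, levels and coefficient values are all hypothetical, and the IMD random effect is simply treated as one more additive term on the logit scale.

```python
import math

def inv_logit(z: float) -> float:
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-z))

def predicted_probability(intercept, coefs, levels, imd_effect=0.0):
    """Sum the intercept, one coefficient per covariate (chosen by the
    individual's level) and an IMD effect, then map to a probability."""
    linear = intercept + imd_effect
    for covariate, level in levels.items():
        linear += coefs[covariate][level]
    return inv_logit(linear)

# Hypothetical coefficients on the logit scale (not estimates from the talk).
coefs = {
    "age": {"16-44": 0.0, "45+": -0.4},
    "uk_born": {"no": 0.0, "yes": 0.6},
}
individual = {"age": "45+", "uk_born": "yes"}

p = predicted_probability(-0.2, coefs, individual, imd_effect=0.1)
print(round(p, 3))
```

In a Bayesian fit this calculation would be repeated for every posterior draw of the coefficients, giving a distribution for \(\hat{\pi}_i\) rather than a single number.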

Multilevel Regression and Post-stratification

  • The health literacy probabilities for each demographic category (cell \(c\)) are weighted by their proportion in the actual Newham population

  • The 11 covariates give 13,824 cells

  • Post-stratified estimate is: \[ \hat{\pi}^{\text{mrp}} = \sum_{c = 1}^{|\mathcal{S}|} w_c \hat{\pi}_{c} \]

  • \(\mathcal{S}\) is the set of all covariate combinations

  • \(N_c\) is the population frequency for cell \(c\)

  • \(N\) is the total population size

  • \(w_c = N_{c} / N\) are the combination weights
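
The post-stratification step is just a weighted average of cell-level predictions. A minimal sketch, assuming three made-up cells with hypothetical probabilities and population counts (the real analysis has 13,824 cells):

```python
def poststratify(cell_probs, cell_counts):
    """Weighted average of cell probabilities with weights w_c = N_c / N."""
    total = sum(cell_counts.values())  # N, the total population size
    return sum(cell_counts[c] / total * cell_probs[c] for c in cell_counts)

# Hypothetical cell-level predictions and population frequencies.
cell_probs = {"a": 0.2, "b": 0.5, "c": 0.8}
cell_counts = {"a": 100, "b": 300, "c": 600}

pi_mrp = poststratify(cell_probs, cell_counts)
print(round(pi_mrp, 2))
```

The same function applied to each posterior draw of the cell probabilities would propagate model uncertainty into the population-level estimate.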

Predictive comparisons 🤔

  • Terminology borrowed from Gelman and Pardoe (2007). Also called predicted change in probability
  • Has appeared previously in other fields, e.g. Lee (1981) (covariance adjustment mean difference)
  • Like average treatment effects without the causal interpretation: \[ \delta_u(u^{(1)}, u^{(2)}) = \frac{E(y \mid u^{(2)}) - E(y \mid u^{(1)})}{u^{(2)} - u^{(1)}} \]
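
A simplified sketch of a predictive comparison for a logistic model follows. It switches the input of interest \(u\) from \(u^{(1)}\) to \(u^{(2)}\) for every cell of the remaining covariates \(v\) and averages the change in predicted probability using population weights. This is a simplification of the Gelman and Pardoe (2007) estimator, and the coefficients, cells and weights are hypothetical.

```python
import numpy as np

def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))

def average_predictive_comparison(beta0, beta_u, beta_v, v, weights, u1, u2):
    """Weighted mean change in predicted probability when u moves from
    u1 to u2, holding the other covariates v fixed, per unit change in u."""
    p1 = inv_logit(beta0 + beta_u * u1 + v @ beta_v)
    p2 = inv_logit(beta0 + beta_u * u2 + v @ beta_v)
    return np.sum(weights * (p2 - p1)) / (u2 - u1)

# Hypothetical cells for two binary covariates and their population weights.
v = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
weights = np.array([0.4, 0.3, 0.2, 0.1])
beta_v = np.array([0.5, -0.3])

delta = average_predictive_comparison(-0.2, 0.8, beta_v, v, weights, u1=0.0, u2=1.0)
delta_null = average_predictive_comparison(-0.2, 0.0, beta_v, v, weights, u1=0.0, u2=1.0)
print(delta, delta_null)
```

Because the comparison is on the probability scale, its size depends on where the other covariates place each cell on the logistic curve, which is why the population weights matter.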

Missing joint distributions

  • Raking / Iterative proportional fitting (IPF)
    • Adjust survey weights so that the sample distribution matches known population control totals (margins)
  • Census data \(\rightarrow\) Marginals
  • Labour Force Survey (LFS) \(\rightarrow\) Covariance structure
  • Overlap issues, non-representative
    • Data augmentation before IPF
    • Laplace smoothing after IPF
      • Like “zero cell” problem in meta-analyses
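
The raking and smoothing steps can be sketched for a two-way table as below. The seed table stands in for a joint structure such as the one taken from the LFS, and the row and column targets stand in for Census margins; all numbers are made up, and the smoothing constant `alpha` is an illustrative choice.

```python
import numpy as np

def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Iterative proportional fitting: alternately rescale rows and
    columns of the seed table until both sets of margins match."""
    table = seed.astype(float).copy()
    for _ in range(max_iter):
        table *= (row_targets / table.sum(axis=1))[:, None]
        table *= (col_targets / table.sum(axis=0))[None, :]
        if np.allclose(table.sum(axis=1), row_targets, atol=tol):
            break
    return table

def laplace_smooth(table, alpha=0.5):
    """Add a pseudo-count to every cell, then rescale to the original
    total. Margins are perturbed slightly as a side effect."""
    total = table.sum()
    smoothed = table + alpha
    return smoothed * total / smoothed.sum()

# Hypothetical seed with a zero cell; IPF cannot make that cell non-zero.
seed = np.array([[10, 0], [5, 20]])
row_targets = np.array([20.0, 80.0])
col_targets = np.array([50.0, 50.0])

fitted = ipf(seed, row_targets, col_targets)
smoothed = laplace_smooth(fitted)
print(fitted.round(2))
print(smoothed.round(2))
```

The zero cell in the seed stays at zero after IPF, which is the motivation for smoothing afterwards: post-stratification cells that exist in the population but not in the seed would otherwise receive no weight.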

Priority ranking

  • Adopt Surface Under the Cumulative Ranking Curve (SUCRA)

    • Common in multiple-treatment meta-analysis
  • Percentage of the maximum possible cumulative rank an intervention can achieve

  • Providing a single value where a higher SUCRA indicates a better overall rank relative to others \[ \text{SUCRA}_{ij} = \sum_{r=1}^{n-1} P_{ijr} / (n-1), \]

  • where \(P_{ijr}\) is the cumulative probability for variable \(i\) at level \(j\) and rank \(r\)

  • The mean rank is \[ \mathbb{E}[\text{rank}(i,j)] = n - \sum_{r=1}^{n-1} P_{ijr}. \]
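
The SUCRA and mean rank formulas above can be computed directly from posterior draws. In this sketch each column is one item to be ranked (for example a variable and level pair) and each row is a posterior draw of its effect; the assumption that a larger value means a better rank, and the toy draws, are illustrative only.

```python
import numpy as np

def sucra(draws):
    """Return (SUCRA, mean rank) per column of `draws`.

    draws has shape (n_draws, n_items); rank 1 is the largest value
    within a draw.
    """
    n_draws, n = draws.shape
    order = np.argsort(-draws, axis=1)            # best item first
    ranks = np.empty_like(order)
    rows = np.arange(n_draws)[:, None]
    ranks[rows, order] = np.arange(1, n + 1)      # rank of each item per draw
    # cum[i, r-1] = P(rank of item i <= r), for r = 1, ..., n-1
    cum = np.stack([(ranks <= r).mean(axis=0) for r in range(1, n)], axis=1)
    cum_sum = cum.sum(axis=1)
    return cum_sum / (n - 1), n - cum_sum

# Toy draws in which the ordering never changes.
draws = np.array([[3.0, 2.0, 1.0], [3.5, 2.5, 0.5]])
scores, mean_ranks = sucra(draws)
print(scores, mean_ranks)
```

With real posterior draws the rankings vary between draws, so SUCRA values fall strictly between 0 and 1 and reflect the uncertainty in the ordering.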

Main results 👍

Sensitivity analyses 👍

  • Concern was raised that the relationship between the covariates and the health literacy outcomes may have changed over time

  • So is using SfL 2011 still appropriate?

  • Suggested using more recent survey data: PIAAC 2023

    • Missing data in PIAAC
    • Impute using covariate relationship in SfL 2011
      • Weaker assumption than with outcome
  • Also fitted to SfL 2003 to see the trend back in time

Covariate imputation

  • Covariate noise introduces dilution (attenuation), where the signal is weakened
  • Uncongeniality is a mismatch between the imputation model and the main model
    • Not including the outcome essentially drops its association with the imputed covariates
  • So there is a trade-off between including the outcome for congeniality and excluding it because of temporal drift
  • Perform sensitivity analysis by adding noise to the outcome
    • Equivalent to the influence of the prior distribution in a Bayesian model
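
The dilution effect in the first bullet can be seen in a small simulation. This sketch uses a linear model rather than the logistic model of the talk, purely because the attenuation factor then has a simple closed form; the sample size, slope and noise variances are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.normal(0.0, 1.0, n)                 # true covariate, variance 1
y = 2.0 * x + rng.normal(0.0, 1.0, n)       # outcome with true slope 2
x_noisy = x + rng.normal(0.0, 1.0, n)       # covariate observed with noise, variance 1

slope_true = np.polyfit(x, y, 1)[0]
slope_noisy = np.polyfit(x_noisy, y, 1)[0]

# Expected attenuation factor: var(x) / (var(x) + var(noise))
reliability = 1.0 / (1.0 + 1.0)
print(slope_true, slope_noisy, 2.0 * reliability)
```

The slope estimated from the noisy covariate is close to the true slope multiplied by the reliability ratio, so here it is roughly halved.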

Note

Figure from Hutcheon, Chiolero, and Hanley (2010)

Congeniality sensitivity in imputation

Conclusions

Thanks 🙏

References

Gelman, Andrew, and Iain Pardoe. 2007. “Average Predictive Comparisons for Models with Nonlinearity, Interactions, and Variance Components.” Sociological Methodology 37 (1): 23–51. https://doi.org/10.1111/j.1467-9531.2007.00181.x.
Gonzalez, Maria E. 1973. “Use and Evaluation of Synthetic Estimates.” In Proceedings of the Social Statistics Section, American Statistical Association, 33–42. American Statistical Association.
Hutcheon, Jennifer A, Arnaud Chiolero, and James A Hanley. 2010. “Random Measurement Error and Regression Dilution Bias.” BMJ 340. https://doi.org/10.1136/bmj.c2289.
Laursen, Kamilla R., Paul T. Seed, Joanne Protheroe, Michael S. Wolf, and Gill P. Rowlands. 2016. “Developing a Method to Derive Indicative Health Literacy from Routine Socio-Demographic Data.” Journal of Health Care Communications 1 (4): 1–9. https://doi.org/10.4172/2472-1654.100033.
Lee, James. 1981. “Covariance Adjustment of Rates Based on the Multiple Logistic Regression Model.” Journal of Chronic Diseases 34 (8): 415–26. https://doi.org/10.1016/0021-9681(81)90006-4.
Rao, J. N. K., and Isabel Molina. 2015. Small Area Estimation. 2nd ed. Wiley Series in Survey Methodology. John Wiley & Sons.
Rowlands, G, J Protheroe, J Winkley, et al. 2015. “A Mismatch Between Population Health Literacy and the Complexity of Health Information: An Observational Study.” British Journal of General Practice 65 (635): e379–86. https://doi.org/10.3399/bjgp15X685285.